Results 1 - 20 of 29
1.
Article in English | MEDLINE | ID: mdl-38170659

ABSTRACT

Human faces contain rich semantic information that can hardly be described without a large vocabulary and complex sentence patterns. However, most existing text-to-image synthesis methods can only generate meaningful results from a limited set of sentence templates whose words appear in the training set, which heavily impairs their generalization ability. In this paper, we define a novel 'free-style' text-to-face generation and manipulation problem and propose an effective solution, named AnyFace++, which is applicable to a much wider range of open-world scenarios. AnyFace++ incorporates the CLIP model to learn an aligned language-vision feature space, which also expands the range of acceptable vocabulary since CLIP is trained on a large-scale dataset. To further improve the granularity of semantic alignment between text and images, a memory module converts descriptions of arbitrary length, format, and modality into regularized latent embeddings representing discriminative attributes of the target face. Moreover, the diversity and semantic consistency of the generated results are improved by a novel semi-supervised training scheme and a series of newly proposed objective functions. Compared to state-of-the-art methods, AnyFace++ synthesizes and manipulates face images from more flexible descriptions and produces realistic images with higher diversity.
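
As a rough illustration of the memory-module idea, here is a minimal PyTorch sketch in which learnable memory slots attend over a variable-length CLIP token sequence to produce fixed-size latent attribute embeddings. The class name, slot count, and dimensions are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class MemoryModule(nn.Module):
    def __init__(self, num_slots=8, dim=512):
        super().__init__()
        # Learnable memory slots, roughly one per latent face attribute.
        self.slots = nn.Parameter(torch.randn(num_slots, dim))
        self.attn = nn.MultiheadAttention(dim, num_heads=8, batch_first=True)

    def forward(self, text_tokens):  # (B, L, dim) CLIP token features
        B = text_tokens.size(0)
        q = self.slots.unsqueeze(0).expand(B, -1, -1)   # (B, S, dim)
        # Each slot attends over the description, however long it is.
        out, _ = self.attn(q, text_tokens, text_tokens)
        return out  # (B, S, dim) regularized latent attribute embeddings

tokens = torch.randn(2, 17, 512)   # e.g. CLIP features of a 17-token caption
latents = MemoryModule()(tokens)
print(latents.shape)               # torch.Size([2, 8, 512])
```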

2.
IEEE Trans Pattern Anal Mach Intell ; 45(12): 15120-15136, 2023 Dec.
Article in English | MEDLINE | ID: mdl-37490385

ABSTRACT

Occlusion is a common problem in biometric recognition in the wild. The generalization ability of CNNs decreases greatly under the adverse effects of various occlusions. To this end, we propose a novel unified framework that integrates the merits of both CNNs and graph models to overcome occlusion problems in biometric recognition, called multiscale dynamic graph representation (MS-DGR). More specifically, a group of deep features extracted from certain subregions is recrafted into a feature graph (FG). Each node in the FG characterizes a specific local region of the input sample, and the edges imply the co-occurrence of non-occluded regions. By analyzing the similarities of the node representations and measuring the topological structures stored in the adjacency matrix, the proposed framework leverages dynamic graph matching to judiciously discard the nodes corresponding to occluded parts. A multiscale strategy is further incorporated to attain more diverse nodes representing regions of various sizes. Furthermore, the proposed framework offers more interpretable inference by exposing the paired nodes. Extensive experiments demonstrate the superiority of the proposed framework, which boosts accuracy in both natural and occlusion-simulated cases by a large margin over baseline methods.
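
A hedged sketch of the node-matching-and-discarding idea: compare subregion node features between a probe and a gallery graph, drop nodes whose best match falls below a threshold (treated as occluded), and score on the rest. The similarity rule and threshold are illustrative stand-ins for the paper's dynamic graph matching.

```python
import torch
import torch.nn.functional as F

def match_graphs(probe, gallery, tau=0.5):
    # probe, gallery: (N, D) node features, one node per subregion
    sim = F.normalize(probe, dim=1) @ F.normalize(gallery, dim=1).T  # (N, N)
    best, idx = sim.max(dim=1)          # best gallery match per probe node
    keep = best > tau                   # nodes below tau treated as occluded
    score = best[keep].mean() if keep.any() else sim.new_tensor(0.0)
    return score, idx[keep]             # overall score and surviving pairs

probe = torch.randn(16, 256)    # 16 subregion nodes, 256-d features
gallery = torch.randn(16, 256)
score, pairs = match_graphs(probe, gallery)
print(float(score), pairs.tolist())
```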

3.
IEEE Trans Pattern Anal Mach Intell ; 45(12): 14590-14610, 2023 Dec.
Article in English | MEDLINE | ID: mdl-37494159

ABSTRACT

Facial Attribute Manipulation (FAM) aims to aesthetically modify a given face image to render desired attributes; it has received significant attention due to its broad practical applications, ranging from digital entertainment to biometric forensics. In the last decade, with the remarkable success of Generative Adversarial Networks (GANs) in synthesizing realistic images, numerous GAN-based models have been proposed to solve FAM with various problem formulations and guiding-information representations. This paper presents a comprehensive survey of GAN-based FAM methods, focusing on summarizing their principal motivations and technical details. The main contents of this survey include: (i) an introduction to the research background and basic concepts related to FAM, (ii) a systematic review of GAN-based FAM methods in three main categories, and (iii) an in-depth discussion of important properties of FAM methods, open issues, and future research directions. This survey not only provides a good starting point for researchers new to this field but also serves as a reference for the vision community.

4.
IEEE Trans Pattern Anal Mach Intell ; 45(10): 12287-12303, 2023 Oct.
Article in English | MEDLINE | ID: mdl-37126625

ABSTRACT

We present PyMAF-X, a regression-based approach to recovering a parametric full-body model from a single image. This task is very challenging because minor parametric deviations may lead to noticeable misalignment between the estimated mesh and the input image. Moreover, when integrating part-specific estimations into the full-body model, existing solutions tend to either degrade the alignment or produce unnatural wrist poses. To address these issues, we propose a Pyramidal Mesh Alignment Feedback (PyMAF) loop in our regression network for well-aligned human mesh recovery, and extend it as PyMAF-X for the recovery of expressive full-body models. The core idea of PyMAF is to leverage a feature pyramid and rectify the predicted parameters explicitly based on the mesh-image alignment status. Specifically, given the currently predicted parameters, mesh-aligned evidence is extracted from finer-resolution features accordingly and fed back for parameter rectification. To enhance alignment perception, auxiliary dense supervision is employed to provide mesh-image correspondence guidance, while spatial alignment attention is introduced to make our network aware of global contexts. When extending PyMAF to full-body mesh recovery, an adaptive integration strategy is proposed in PyMAF-X to produce natural wrist poses while maintaining the well-aligned performance of the part-specific estimations. The efficacy of our approach is validated on several benchmark datasets for body, hand, face, and full-body mesh recovery, where PyMAF and PyMAF-X effectively improve the mesh-image alignment and achieve new state-of-the-art results. The project page with code and video results can be found at https://www.liuyebin.com/pymaf-x.
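
A minimal sketch of the alignment-feedback loop: at each pyramid level, sample features at the currently projected mesh points and regress a residual parameter correction. The projection and sampling details are simplified assumptions, not the released PyMAF code.

```python
import torch
import torch.nn as nn

class FeedbackLoop(nn.Module):
    def __init__(self, dims=(64, 32, 16), n_pts=49, n_params=85):
        super().__init__()
        self.heads = nn.ModuleList(nn.Linear(d * n_pts, n_params) for d in dims)

    def sample(self, feat, pts):
        # feat: (B, C, H, W); pts: (B, N, 2) in [-1, 1] image coordinates
        g = torch.nn.functional.grid_sample(
            feat, pts.unsqueeze(2), align_corners=False)   # (B, C, N, 1)
        return g.squeeze(-1).flatten(1)                    # (B, C*N)

    def forward(self, pyramid, pts, params):
        for feat, head in zip(pyramid, self.heads):
            params = params + head(self.sample(feat, pts))  # residual update
            # a real model would re-project the mesh here to refresh `pts`
        return params

pyr = [torch.randn(1, d, s, s) for d, s in [(64, 14), (32, 28), (16, 56)]]
out = FeedbackLoop()(pyr, torch.rand(1, 49, 2) * 2 - 1, torch.zeros(1, 85))
print(out.shape)  # torch.Size([1, 85])
```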

5.
Article in English | MEDLINE | ID: mdl-37018302

ABSTRACT

Clinical management and accurate disease diagnosis are evolving from the qualitative stage to the quantitative stage, particularly at the cellular level. However, manual histopathological analysis is labor-intensive and time-consuming, and its accuracy is limited by the experience of the pathologist. Therefore, deep learning-empowered computer-aided diagnosis (CAD) is emerging as an important topic in digital pathology for streamlining the standard process of automatic tissue analysis. Automated, accurate nucleus segmentation not only helps pathologists make more accurate diagnoses while saving time and labor, but also yields consistent and efficient diagnostic results. However, nucleus segmentation is susceptible to staining variation, uneven nucleus intensity, background noise, and nucleus-tissue differences in biopsy specimens. To solve these problems, we propose Deep Attention Integrated Networks (DAINets), built mainly on a self-attention-based spatial attention module and a channel attention module. In addition, we introduce a feature fusion branch to fuse high-level representations with low-level features for multi-scale perception, and employ the marker-based watershed algorithm to refine the predicted segmentation maps. Furthermore, in the testing phase, we design Individual Color Normalization (ICN) to address the staining variation problem in specimens. Quantitative evaluations on the multi-organ nucleus dataset demonstrate the superiority of our automated nucleus segmentation framework.
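
The marker-based watershed refinement step can be illustrated in a few lines with scikit-image: confident nucleus cores become markers, and the watershed floods from them over the inverted probability map to split touching nuclei. The thresholds here are assumptions, not the paper's settings.

```python
import numpy as np
from scipy import ndimage as ndi
from skimage.segmentation import watershed

def refine(prob, fg_thresh=0.5, marker_thresh=0.8):
    fg = prob > fg_thresh                         # coarse foreground mask
    markers, _ = ndi.label(prob > marker_thresh)  # confident nucleus cores
    # Flood from the markers over the inverted probability surface,
    # separating instances along low-probability ridges.
    return watershed(-prob, markers, mask=fg)

prob = np.zeros((64, 64))
prob[10:25, 10:25] = 0.9
prob[35:50, 35:50] = 0.9
labels = refine(prob)
print(labels.max(), "instances")  # 2 instances
```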

6.
IEEE Trans Med Imaging ; 42(4): 1159-1171, 2023 04.
Article in English | MEDLINE | ID: mdl-36423314

ABSTRACT

With the development of deep convolutional neural networks, medical image segmentation has achieved a series of breakthroughs in recent years. However, high-performance convolutional neural networks always mean numerous parameters and high computation costs, which hinder their application in resource-limited medical scenarios. Meanwhile, the scarcity of large-scale annotated medical image datasets further impedes the application of high-performance networks. To tackle these problems, we propose Graph Flow, a comprehensive knowledge distillation framework, for both network-efficient and annotation-efficient medical image segmentation. Specifically, Graph Flow Distillation transfers the essence of cross-layer variations from a well-trained cumbersome teacher network to an untrained compact student network. In addition, an unsupervised Paraphraser Module is integrated to purify the knowledge of the teacher, which also benefits training stabilization. Furthermore, we build a unified distillation framework by integrating adversarial distillation and vanilla logits distillation, which can further refine the final predictions of the compact network. With different teacher networks (a traditional convolutional architecture or a prevalent transformer architecture) and student networks, we conduct extensive experiments on four medical image datasets with different modalities (Gastric Cancer, Synapse, BUSI, and CVC-ClinicDB). We demonstrate the prominent ability of our method on these datasets, achieving competitive performance. Moreover, we demonstrate the effectiveness of Graph Flow through a novel semi-supervised paradigm for dually efficient medical image segmentation. Our code will be available at Graph Flow.
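
One way to picture "transferring cross-layer variations" is FSP-style flow matrices: match the inner products between consecutive feature maps of teacher and student. The sketch below assumes matching spatial sizes and is an illustration in that spirit, not the authors' exact Graph Flow loss.

```python
import torch
import torch.nn.functional as F

def flow_matrix(f1, f2):
    # f1: (B, C1, H, W), f2: (B, C2, H, W) -> (B, C1, C2) cross-layer flow
    B, C1, H, W = f1.shape
    C2 = f2.shape[1]
    return torch.bmm(f1.reshape(B, C1, H * W),
                     f2.reshape(B, C2, H * W).transpose(1, 2)) / (H * W)

def graph_flow_loss(teacher_feats, student_feats):
    loss = 0.0
    for (t1, t2), (s1, s2) in zip(zip(teacher_feats, teacher_feats[1:]),
                                  zip(student_feats, student_feats[1:])):
        # the student mimics how the teacher's features evolve across layers
        loss = loss + F.mse_loss(flow_matrix(t1, t2), flow_matrix(s1, s2))
    return loss

t = [torch.randn(2, 32, 16, 16) for _ in range(3)]
s = [torch.randn(2, 32, 16, 16) for _ in range(3)]
print(graph_flow_loss(t, s))
```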


Subject(s)
Image Processing, Computer-Assisted; Neural Networks, Computer
7.
IEEE Trans Image Process ; 31: 4651-4662, 2022.
Article in English | MEDLINE | ID: mdl-35786554

ABSTRACT

One major issue that challenges person re-identification (Re-ID) is the ubiquitous occlusion over captured persons. The occluded person Re-ID problem poses two main challenges: the interference of noise during feature matching and the loss of pedestrian information caused by occlusions. In this paper, we propose a new approach called Feature Recovery Transformer (FRT) to address both challenges simultaneously; it mainly consists of visibility graph matching and a feature recovery transformer. To reduce the interference of noise during feature matching, we focus on visible regions that appear in both images and develop a visibility graph to calculate the similarity. For the second challenge, based on the developed graph similarity, for each query image we propose a recovery transformer that exploits the feature sets of its k-nearest neighbors in the gallery to recover the complete features. Extensive experiments across different person Re-ID datasets, including occluded, partial, and holistic datasets, demonstrate the effectiveness of FRT. Specifically, FRT significantly outperforms state-of-the-art methods by at least 6.2% Rank-1 accuracy and 7.2% mAP on the challenging Occluded-Duke dataset.
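
To make the recovery idea concrete, here is a transformer-free simplification: fill the occluded parts of a query with a similarity-weighted average of its k-nearest gallery neighbors, where similarity is computed only over mutually visible parts. The weighting scheme and shapes are assumptions.

```python
import torch
import torch.nn.functional as F

def recover(query, gallery, visible, k=3):
    # query: (P, D) part features; gallery: (G, P, D); visible: (P,) bool
    q = F.normalize(query[visible].flatten(), dim=0)
    g = F.normalize(gallery[:, visible, :].flatten(1), dim=1)
    w, idx = (g @ q).topk(k)                # similarity over visible parts
    w = torch.softmax(w, dim=0)
    neighbors = gallery[idx]                # (k, P, D) nearest gallery feats
    recovered = (w[:, None, None] * neighbors).sum(0)
    out = query.clone()
    out[~visible] = recovered[~visible]     # fill occluded parts only
    return out

q = torch.randn(6, 128)
gal = torch.randn(50, 6, 128)
vis = torch.tensor([1, 1, 1, 0, 0, 1], dtype=torch.bool)
print(recover(q, gal, vis).shape)  # torch.Size([6, 128])
```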


Subject(s)
Biometric Identification; Pedestrians; Biometric Identification/methods; Humans; Image Processing, Computer-Assisted/methods; Machine Learning
8.
IEEE Trans Pattern Anal Mach Intell ; 44(5): 2610-2627, 2022 05.
Article in English | MEDLINE | ID: mdl-33270560

ABSTRACT

Reconstructing 3D human shape and pose from monocular images is challenging despite the promising results achieved by the most recent learning-based methods. The commonly observed misalignment stems from two facts: the mapping from images to the model space is highly non-linear, and the rotation-based pose representation of the body model is prone to drift in joint positions. In this work, we investigate learning 3D human shape and pose from dense correspondences of body parts and propose a Decompose-and-aggregate Network (DaNet) to address these issues. DaNet adopts dense correspondence maps, which densely build a bridge between 2D pixels and 3D vertices, as intermediate representations to facilitate the learning of the 2D-to-3D mapping. The prediction modules of DaNet are decomposed into one global stream and multiple local streams to enable global and fine-grained perception for the shape and pose predictions, respectively. Messages from local streams are further aggregated to enhance the robust prediction of rotation-based poses, where a position-aided rotation feature refinement strategy is proposed to exploit spatial relationships between body joints. Moreover, a Part-based Dropout (PartDrop) strategy is introduced to drop out dense information from intermediate representations during training, encouraging the network to focus on more complementary body parts as well as neighboring position features. The efficacy of the proposed method is validated on both indoor and real-world datasets, including Human3.6M, UP3D, COCO, and 3DPW, showing that our method significantly improves reconstruction performance in comparison with previous state-of-the-art methods. Our code is publicly available at https://hongwenzhang.github.io/dense2mesh.
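
A hedged sketch of the PartDrop flavor: during training, randomly zero out entire body-part regions of a dense correspondence map so the network must rely on complementary parts. The part encoding (channel 0 as a part index) is an assumption for illustration.

```python
import torch

def part_drop(iuv, part_ids, drop_rate=0.3):
    # iuv: (B, 3, H, W) dense correspondence map; channel 0 = part index
    out = iuv.clone()
    for b in range(iuv.size(0)):
        for p in part_ids:
            if torch.rand(()) < drop_rate:
                out[b, :, iuv[b, 0] == p] = 0.0  # drop every pixel of part p
    return out

iuv = torch.randint(0, 25, (2, 3, 56, 56)).float()
dropped = part_drop(iuv, part_ids=range(1, 25))
print((dropped == 0).float().mean())  # fraction of zeroed entries
```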


Subject(s)
Human Body; Imaging, Three-Dimensional; Algorithms; Humans; Imaging, Three-Dimensional/methods
9.
Article in English | MEDLINE | ID: mdl-37015555

ABSTRACT

Recent studies of video action recognition can be classified into two categories: appearance-based methods and pose-based methods. Appearance-based methods, which depend on optical flow estimation, generally cannot model the temporal dynamics of large motions well, while pose-based methods ignore visual context information such as typical scenes and objects, which are also important cues for action understanding. In this paper, we tackle these problems by proposing a Pose-Appearance Relational Network (PARNet), which models the correlation between human pose and image appearance and combines the benefits of the two modalities to improve robustness on unconstrained real-world videos. There are three network streams in our model: a pose stream, an appearance stream, and a relation stream. For the pose stream, a Temporal Multi-Pose RNN module obtains dynamic representations through temporal modeling of 2D poses. For the appearance stream, a Spatial Appearance CNN module extracts the global appearance representation of the video sequence. For the relation stream, a Pose-Aware RNN module connects the pose and appearance streams by modeling action-sensitive visual context information. By jointly optimizing the three modules, PARNet achieves superior performance compared with state-of-the-art methods on both pose-complete datasets (KTH, Penn-Action, UCF11) and challenging pose-incomplete datasets (UCF101, HMDB51, JHMDB), demonstrating its robustness to complex environments and noisy skeletons. Its effectiveness on the NTU-RGBD dataset is also validated, even in comparison with 3D skeleton-based methods. Furthermore, an appearance-enhanced PARNet equipped with an RGB-based I3D stream outperforms Kinetics pre-trained competitors on UCF101 and HMDB51. These experimental results verify the potential of our framework for integrating various modules.
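
A toy sketch of the three-stream layout: a pose RNN, an appearance head, and a relation head over their concatenation, fused by averaging class logits. The fusion rule and layer sizes are assumptions; the paper trains the streams jointly with its own modules.

```python
import torch
import torch.nn as nn

class TinyPARNet(nn.Module):
    def __init__(self, n_classes=10, pose_dim=34, app_dim=512):
        super().__init__()
        self.pose_rnn = nn.GRU(pose_dim, 128, batch_first=True)
        self.pose_fc = nn.Linear(128, n_classes)
        self.app_fc = nn.Linear(app_dim, n_classes)
        self.rel_fc = nn.Linear(128 + app_dim, n_classes)

    def forward(self, pose_seq, app_feat):
        _, h = self.pose_rnn(pose_seq)           # temporal pose modeling
        h = h[-1]                                # (B, 128) last hidden state
        rel = torch.cat([h, app_feat], dim=1)    # pose-appearance relation
        return (self.pose_fc(h) + self.app_fc(app_feat) + self.rel_fc(rel)) / 3

logits = TinyPARNet()(torch.randn(2, 16, 34), torch.randn(2, 512))
print(logits.shape)  # torch.Size([2, 10])
```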

10.
IEEE Trans Pattern Anal Mach Intell ; 42(5): 1025-1037, 2020 05.
Article in English | MEDLINE | ID: mdl-31880541

ABSTRACT

Near infrared-visible (NIR-VIS) heterogeneous face recognition refers to the process of matching NIR to VIS face images. Current heterogeneous methods try to extend VIS face recognition methods to the NIR spectrum by synthesizing VIS images from NIR images. However, due to self-occlusion and the sensing gap, NIR face images lose some visible-spectrum content and are therefore always incomplete compared to VIS face images. This paper models high-resolution heterogeneous face synthesis as a complementary combination of two components: a texture inpainting component and a pose correction component. The inpainting component synthesizes and inpaints VIS image textures from NIR image textures. The correction component maps any pose in NIR images to a frontal pose in VIS images, resulting in paired NIR and VIS textures. A warping procedure is developed to integrate the two components into an end-to-end deep network. A fine-grained discriminator and a wavelet-based discriminator are designed to improve visual quality. A novel 3D-based pose correction loss, two adversarial losses, and a pixel loss are imposed to ensure synthesis quality. We demonstrate that, by attaching the correction component, we can simplify heterogeneous face synthesis from one-to-many unpaired image translation to one-to-one paired image translation, and minimize the spectral and pose discrepancy during heterogeneous recognition. Extensive experimental results show that our network not only generates high-resolution VIS face images but also facilitates accuracy improvements in heterogeneous face recognition.
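
The warping procedure that ties the two components together can be pictured with grid_sample: resample a synthesized texture through a dense sampling field, which keeps the whole pipeline differentiable end to end. The identity grid below is a stand-in for a predicted pose-correction field.

```python
import torch
import torch.nn.functional as F

def warp(texture, grid):
    # texture: (B, 3, H, W); grid: (B, H, W, 2) sampling field in [-1, 1]
    return F.grid_sample(texture, grid, align_corners=False)

tex = torch.randn(1, 3, 128, 128)
# identity grid as a placeholder for a learned pose-correction field
ys, xs = torch.meshgrid(torch.linspace(-1, 1, 128),
                        torch.linspace(-1, 1, 128), indexing="ij")
grid = torch.stack([xs, ys], dim=-1).unsqueeze(0)
print(warp(tex, grid).shape)  # torch.Size([1, 3, 128, 128])
```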


Subject(s)
Automated Facial Recognition/methods; Spectroscopy, Near-Infrared/methods; Databases, Factual; Face/anatomy & histology; Face/diagnostic imaging; Humans; Supervised Machine Learning
11.
Article in English | MEDLINE | ID: mdl-31567089

ABSTRACT

Binocular stereo vision (SV) has been widely used to reconstruct depth information, but it is quite vulnerable to scenes with strong occlusions. As an emerging computational photography technology, light-field (LF) imaging brings a novel solution to passive depth perception by recording multiple angular views in a single exposure. In this paper, we explore binocular SV and LF imaging to form a binocular-LF imaging system. An imaging theory is derived by modeling the imaging process and analyzing disparity properties based on geometrical optics. Then an accurate occlusion-robust depth estimation algorithm is proposed by exploiting multi-baseline stereo matching cues and defocus cues. The occlusions caused by binocular SV and LF imaging are detected and handled to eliminate matching ambiguities and outliers. Finally, we develop a binocular-LF database and capture real-world scenes with our binocular-LF system to test accuracy and robustness. The experimental results demonstrate that the proposed algorithm recovers high-quality depth maps with smooth surfaces and precise geometric shapes, tackling the drawbacks of binocular SV and LF imaging simultaneously.
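
A much-simplified multi-baseline matching cost illustrates the geometric core: a point at disparity d appears shifted by d times the baseline in each view, so costs from all baselines can be accumulated per candidate disparity. Occlusion handling and defocus cues from the paper are omitted here.

```python
import numpy as np

def disparity_from_views(ref, views, baselines, disps):
    # ref: (H, W) reference view; views: list of (H, W) other views
    H, W = ref.shape
    cost = np.zeros((len(disps), H, W))
    for i, d in enumerate(disps):
        for view, b in zip(views, baselines):
            # undo the expected shift d*b before comparing to the reference
            aligned = np.roll(view, -int(round(d * b)), axis=1)
            cost[i] += np.abs(ref - aligned)
    return np.array(list(disps))[cost.argmin(axis=0)]  # winner-take-all

ref = np.random.rand(32, 32)
views = [np.roll(ref, 1, axis=1), np.roll(ref, 2, axis=1)]  # disparity 1
print(disparity_from_views(ref, views, baselines=[1, 2], disps=range(4)).shape)
```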

12.
Article in English | MEDLINE | ID: mdl-31021767

ABSTRACT

Regression-based methods have revolutionized 2D landmark localization through the exploitation of deep neural networks and massive annotated datasets in the wild. However, 3D landmark localization remains challenging due to the lack of annotated datasets and the ambiguous nature of landmarks from a 3D perspective. This paper revisits regression-based methods and proposes an adversarial voxel and coordinate regression framework for 2D and 3D facial landmark localization in real-world scenarios. First, a semantic volumetric representation is introduced to encode the per-voxel likelihood of positions being the 3D landmarks. Then, an end-to-end pipeline is designed to jointly regress the proposed volumetric representation and the coordinate vector. Such a pipeline not only enhances the robustness and accuracy of the predictions but also unifies 2D and 3D landmark localization so that 2D and 3D datasets can be utilized simultaneously. Further, an adversarial learning strategy is exploited to distill 3D structure learned from synthetic datasets to real-world datasets under weakly supervised settings, where an auxiliary regression discriminator encourages the network to produce plausible predictions for both synthetic and real-world images. The effectiveness of our method is validated on the benchmark datasets 3DFAW and AFLW2000-3D for both 2D and 3D facial landmark localization tasks. Experimental results show that the proposed method achieves significant improvements over previous state-of-the-art methods.
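
The link between the volumetric and coordinate representations can be demonstrated with a soft-argmax: taking the expectation over a per-landmark 3D likelihood volume yields differentiable coordinates. Grid size and normalization below are assumptions.

```python
import torch

def soft_argmax_3d(vol):
    # vol: (B, K, D, H, W) per-voxel landmark likelihoods
    B, K, D, H, W = vol.shape
    p = vol.flatten(2).softmax(dim=2).reshape(B, K, D, H, W)
    zs = torch.arange(D, dtype=torch.float32)
    ys = torch.arange(H, dtype=torch.float32)
    xs = torch.arange(W, dtype=torch.float32)
    z = (p.sum((3, 4)) * zs).sum(-1)   # expected index along each axis
    y = (p.sum((2, 4)) * ys).sum(-1)
    x = (p.sum((2, 3)) * xs).sum(-1)
    return torch.stack([x, y, z], dim=-1)  # (B, K, 3) coordinates

coords = soft_argmax_3d(torch.randn(1, 68, 16, 32, 32))
print(coords.shape)  # torch.Size([1, 68, 3])
```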

13.
IEEE Trans Pattern Anal Mach Intell ; 41(5): 1027-1042, 2019 05.
Article in English | MEDLINE | ID: mdl-29993436

ABSTRACT

Unsupervised domain adaptation aims to leverage labeled source data to learn with unlabeled target data. Previous transductive methods tackle it by iteratively seeking a low-dimensional projection to extract invariant features and obtaining pseudo target labels by building a classifier on the source data. However, they merely concentrate on minimizing the cross-domain distribution divergence, while ignoring the intra-domain structure, especially for the target domain. Even after projection, possible risk factors such as imbalanced data distribution may still hinder the performance of target label inference. In this paper, we propose a simple yet effective domain-invariant projection ensemble approach to tackle these two issues together. Specifically, we seek the optimal projection via a novel relaxed domain-irrelevant clustering-promoting term that jointly bridges the cross-domain semantic gap and increases the intra-class compactness in both domains. To further enhance target label inference, we first develop a 'sampling-and-fusion' framework, under which multiple projections are independently learned from various randomized coupled domain subsets. Subsequently, aggregation schemes such as majority voting are utilized to leverage the multiple projections and classify the unlabeled target data. Extensive experimental results on six visual benchmarks, including object, face, and digit images, demonstrate that the proposed methods gain remarkable margins over state-of-the-art unsupervised domain adaptation methods.
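
The 'sampling-and-fusion' skeleton can be sketched with off-the-shelf components: learn several projections on random coupled subsets, classify the target with each, then majority-vote. PCA and a 1-NN classifier here are stand-ins for the paper's clustering-promoting projection and label inference.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neighbors import KNeighborsClassifier

def ensemble_predict(Xs, ys, Xt, n_models=5, seed=0):
    rng = np.random.default_rng(seed)
    votes = []
    for _ in range(n_models):
        idx = rng.choice(len(Xs), size=len(Xs) // 2, replace=False)
        proj = PCA(n_components=8).fit(np.vstack([Xs[idx], Xt]))
        clf = KNeighborsClassifier(1).fit(proj.transform(Xs[idx]), ys[idx])
        votes.append(clf.predict(proj.transform(Xt)))
    votes = np.stack(votes)                 # (n_models, n_target)
    # majority vote across the independently learned projections
    return np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)

Xs, ys = np.random.rand(100, 32), np.random.randint(0, 3, 100)
Xt = np.random.rand(40, 32)
print(ensemble_predict(Xs, ys, Xt).shape)  # (40,)
```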

14.
IEEE Trans Pattern Anal Mach Intell ; 41(7): 1761-1773, 2019 07.
Article in English | MEDLINE | ID: mdl-29993534

ABSTRACT

Heterogeneous face recognition (HFR) aims at matching facial images acquired from different sensing modalities, with mission-critical applications in the forensics, security, and commercial sectors. However, HFR presents more challenging issues than traditional face recognition because of the large intra-class variation among heterogeneous face images and the limited availability of training samples of cross-modality face image pairs. This paper proposes a novel Wasserstein convolutional neural network (WCNN) approach for learning invariant features between near-infrared (NIR) and visual (VIS) face images (i.e., NIR-VIS face recognition). The low-level layers of the WCNN are trained with widely available face images in the VIS spectrum, and the high-level layer is divided into three parts: the NIR layer, the VIS layer, and the NIR-VIS shared layer. The first two layers aim at learning modality-specific features, while the NIR-VIS shared layer is designed to learn a modality-invariant feature subspace. The Wasserstein distance is introduced into the NIR-VIS shared layer to measure the dissimilarity between the heterogeneous feature distributions. WCNN learning minimizes the Wasserstein distance between the NIR distribution and the VIS distribution to obtain invariant deep feature representations of heterogeneous face images. To avoid over-fitting on small-scale heterogeneous face data, a correlation prior is introduced on the fully-connected WCNN layers to reduce the size of the parameter space. This prior is implemented by a low-rank constraint in an end-to-end network. The joint formulation leads to an alternating minimization for deep feature representation at the training stage and an efficient computation for heterogeneous data at the testing stage. Extensive experiments on three challenging NIR-VIS face recognition databases demonstrate the superiority of the WCNN method over state-of-the-art methods.
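
As a simplified illustration of a Wasserstein loss between modality distributions: if NIR and VIS shared-layer features are modeled as diagonal Gaussians, the squared 2-Wasserstein distance has the closed form below. This is a simplification of the paper's loss, not its exact formulation.

```python
import torch

def w2_diag_gauss(f_nir, f_vis):
    # f_*: (N, D) batches of shared-layer features from each modality
    m1, m2 = f_nir.mean(0), f_vis.mean(0)
    s1, s2 = f_nir.std(0), f_vis.std(0)
    # squared W2 between diagonal Gaussians: ||m1-m2||^2 + ||s1-s2||^2
    return ((m1 - m2) ** 2).sum() + ((s1 - s2) ** 2).sum()

nir = torch.randn(64, 128)
vis = torch.randn(64, 128) + 0.5   # simulated modality gap
print(w2_diag_gauss(nir, vis))     # grows with the modality gap
```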


Subject(s)
Biometric Identification/methods; Face/anatomy & histology; Neural Networks, Computer; Spectroscopy, Near-Infrared/methods; Algorithms; Databases, Factual; Facial Expression; Humans; Image Processing, Computer-Assisted/methods
15.
Article in English | MEDLINE | ID: mdl-30582539

ABSTRACT

Hashing has attracted increasing attention due to its tremendous potential for efficient image retrieval and data storage. Compared with conventional hashing methods built on handcrafted features, emerging deep hashing approaches employ deep neural networks to learn both feature representations and hash functions, which have proved more powerful and robust in real-world applications. Currently, most existing deep hashing methods construct pairwise or triplet-wise constraints to obtain similar binary codes for similar data pairs, or relatively similar binary codes within a triplet. However, some critical local structures of the data are left unexploited, so the effectiveness of hash learning is not fully realized. To address this limitation, we propose a novel deep hashing method named local semantic-aware deep hashing with Hamming-isometric quantization (LSDH), where local similarity of the data is intentionally integrated into hash learning. Specifically, in the Hamming space, we exploit the potential semantic relations of the data to robustly preserve their local similarity. In addition to reducing the error introduced by binary quantization, we further develop a Hamming-isometric objective to maximize the consistency between the similarity of pairwise binary-like features and that of their binary code pairs, which is shown to enhance the quality of the binary codes. Extensive experimental results on several benchmark datasets, including three single-label datasets (i.e., CIFAR-10, CIFAR-20, and SUN397) and one multi-label dataset (NUS-WIDE), demonstrate that the proposed LSDH achieves superior performance over the latest state-of-the-art hashing methods.
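
One way to read the Hamming-isometric objective is as a consistency term: the similarity of the continuous binary-like outputs and the similarity of their signed codes should agree, so quantization preserves similarity. The sketch below captures that flavor; the paper's exact objective may differ.

```python
import torch
import torch.nn.functional as F

def hamming_isometric_loss(u, v):
    # u, v: (B, K) real-valued "binary-like" network outputs for a pair
    cont = F.cosine_similarity(torch.tanh(u), torch.tanh(v))   # pre-quantization
    disc = F.cosine_similarity(torch.sign(u), torch.sign(v))   # after sign(.)
    # penalize disagreement introduced by binary quantization
    return ((cont - disc) ** 2).mean()

u, v = torch.randn(8, 48), torch.randn(8, 48)
print(hamming_isometric_loss(u, v))
```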

16.
Article in English | MEDLINE | ID: mdl-30235130

ABSTRACT

Partial face recognition (PFR) in unconstrained environments is a very important task, especially in situations where partial face images are likely to be captured due to occlusion, out-of-view regions, and large viewing angles, e.g., in video surveillance and on mobile devices. However, little attention has been paid to PFR so far, and the problem of recognizing an arbitrary patch of a face image thus remains largely unsolved. This study proposes a novel partial face recognition approach, called Dynamic Feature Matching (DFM), which combines Fully Convolutional Networks (FCNs) and Sparse Representation Classification (SRC) to address the partial face recognition problem regardless of face size. DFM does not require prior position information of partial faces relative to a holistic face. By sharing computation, the feature maps are calculated from the entire input image once, which yields a significant speedup. Experimental results demonstrate the effectiveness and advantages of DFM in comparison with state-of-the-art PFR methods on several partial face databases, including the CASIA-NIR-Distance, CASIA-NIR-Mobile, and LFW databases. The performance of DFM is also impressive in partial person re-identification on the Partial RE-ID and iLIDS databases. The source code of DFM can be found at https://github.com/lingxiao-he/dfm_new.
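
The shared-computation idea can be pictured with a cross-correlation trick: run the FCN over the full gallery face once, then score the partial probe's feature map against every same-sized window in a single conv2d call. SRC is simplified away here; shapes are assumptions.

```python
import torch
import torch.nn.functional as F

def best_window_score(gal_feat, probe_feat):
    # gal_feat: (1, C, H, W); probe_feat: (1, C, h, w) with h<=H, w<=W
    probe = F.normalize(probe_feat.flatten(), dim=0).reshape(probe_feat.shape)
    # using the probe as a conv kernel scores every gallery window at once
    scores = F.conv2d(gal_feat, probe)      # (1, 1, H-h+1, W-w+1)
    return scores.max()

gal = torch.randn(1, 64, 14, 14)    # FCN features of a holistic gallery face
probe = torch.randn(1, 64, 6, 6)    # FCN features of an arbitrary face patch
print(best_window_score(gal, probe))
```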

17.
IEEE Trans Image Process ; 27(9): 4274-4286, 2018 Sep.
Article in English | MEDLINE | ID: mdl-29870347

ABSTRACT

The low spatial resolution of light-field images poses significant difficulties in exploiting their advantages. To mitigate the dependence on accurate depth or disparity information as priors for light-field image super-resolution, we propose an implicitly multi-scale fusion scheme that accumulates contextual information from multiple scales for super-resolution reconstruction. The implicitly multi-scale fusion scheme is then incorporated into a bidirectional recurrent convolutional neural network, which iteratively models the spatial relations between horizontally or vertically adjacent sub-aperture images of the light-field data. Within the network, the recurrent convolutions are modified to be more effective and flexible in modeling the spatial correlations between neighboring views. A horizontal sub-network and a vertical sub-network with the same structure are ensembled for the final output via stacked generalization. Experimental results on synthetic and real-world datasets demonstrate that the proposed method outperforms other state-of-the-art methods by a large margin in peak signal-to-noise ratio and gray-scale structural similarity indexes, and also achieves superior quality for the human visual system. Furthermore, the proposed method can enhance the performance of light-field applications such as depth estimation.
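
A minimal sketch of bidirectional recurrence over a row of sub-aperture views: convolutional states are propagated left-to-right and right-to-left along the row, then fused per view. Layer sizes are illustrative; the real network stacks such passes with the multi-scale fusion scheme.

```python
import torch
import torch.nn as nn

class BiRecurrentRow(nn.Module):
    def __init__(self, ch=16):
        super().__init__()
        self.inp = nn.Conv2d(1, ch, 3, padding=1)
        self.rec = nn.Conv2d(2 * ch, ch, 3, padding=1)  # prev state + current
        self.out = nn.Conv2d(2 * ch, 1, 3, padding=1)

    def sweep(self, feats):
        h = torch.zeros_like(feats[0])
        states = []
        for f in feats:                     # recurrent conv along the row
            h = torch.relu(self.rec(torch.cat([h, f], dim=1)))
            states.append(h)
        return states

    def forward(self, views):               # list of (B, 1, H, W) views
        feats = [torch.relu(self.inp(v)) for v in views]
        fwd = self.sweep(feats)
        bwd = self.sweep(feats[::-1])[::-1]
        return [self.out(torch.cat([f, b], dim=1)) for f, b in zip(fwd, bwd)]

row = [torch.randn(1, 1, 32, 32) for _ in range(5)]
outs = BiRecurrentRow()(row)
print(len(outs), outs[0].shape)  # 5 torch.Size([1, 1, 32, 32])
```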

18.
IEEE Trans Pattern Anal Mach Intell ; 40(2): 332-351, 2018 02.
Article in English | MEDLINE | ID: mdl-28212078

ABSTRACT

Biometrics is the technique of automatically recognizing individuals based on their biological or behavioral characteristics. Various biometric traits have been introduced and widely investigated, including fingerprint, iris, face, voice, palmprint, and gait. Apart from identity, biometric data may convey various other personal information, covering affect, age, gender, race, accent, handedness, height, weight, etc. Among these, the analysis of demographics (age, gender, and race) has received tremendous attention owing to its wide real-world applications, with significant efforts devoted and great progress achieved. This survey first presents biometric demographic analysis from the standpoint of human perception, then provides a comprehensive overview of state-of-the-art advances in automated estimation from both academia and industry. Despite these advances, a number of challenging issues continue to inhibit its full potential. We then discuss these open problems, and finally provide an outlook into the future of this very active field of research by sharing some promising opportunities.


Subject(s)
Biometric Identification/methods; Demography/methods; Adolescent; Adult; Algorithms; Electrocardiography; Face/diagnostic imaging; Female; Humans; Iris/diagnostic imaging; Machine Learning; Male; Middle Aged; Racial Groups/classification; Sex Determination Analysis; Young Adult
19.
IEEE Trans Neural Netw Learn Syst ; 29(3): 608-617, 2018 03.
Article in English | MEDLINE | ID: mdl-28055923

ABSTRACT

Data-dependent hashing has recently attracted attention due to its ability to support efficient retrieval and storage of high-dimensional data, such as documents, images, and videos. In this paper, we propose a novel learning-based hashing method called 'supervised discrete hashing with relaxation' (SDHR), based on 'supervised discrete hashing' (SDH). SDH uses ordinary least squares regression and a traditional zero-one matrix encoding of class label information as the regression target (code words), thus fixing the regression target. In SDHR, the regression target is instead optimized: the optimized regression target matrix satisfies a large-margin constraint for correct classification of each example. Compared with SDH, which uses the traditional zero-one matrix, SDHR utilizes the learned regression target matrix and therefore more accurately measures the classification error of the regression model and is more flexible. As expected, SDHR generally outperforms SDH. Experimental results on two large-scale image datasets (CIFAR-10 and MNIST) and a large-scale, challenging face dataset (FRGC) demonstrate the effectiveness and efficiency of SDHR.
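
For context, here is the fixed-target regression step that SDH uses and that SDHR relaxes: ridge-regularized least squares from binary codes to a zero-one label matrix. SDHR would replace Y below with a learned target satisfying a per-example margin; that update is omitted here, and the regularizer lam is an assumption.

```python
import numpy as np

def fixed_target_regression(B, labels, n_classes, lam=1.0):
    # B: (n, k) binary codes; labels: (n,) class ids
    Y = np.eye(n_classes)[labels]                 # zero-one regression target
    # ridge-regularized least squares: W = (B^T B + lam I)^-1 B^T Y
    W = np.linalg.solve(B.T @ B + lam * np.eye(B.shape[1]), B.T @ Y)
    return W                                      # (k, n_classes) classifier

B = np.sign(np.random.randn(100, 32))
W = fixed_target_regression(B, np.random.randint(0, 5, 100), n_classes=5)
print(W.shape)  # (32, 5)
```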

20.
IEEE Trans Pattern Anal Mach Intell ; 40(2): 490-496, 2018 02.
Article in English | MEDLINE | ID: mdl-28287956

ABSTRACT

Learning-based hashing algorithms are 'hot topics' because they can greatly increase the scale at which existing methods operate. In this paper, we propose a new learning-based hashing method called 'fast supervised discrete hashing' (FSDH), based on 'supervised discrete hashing' (SDH). Regressing the training examples (or hash codes) to the corresponding class labels is widely used in ordinary least squares regression. Rather than adopting this method, FSDH uses a very simple yet effective regression of the class labels of training examples to the corresponding hash codes to accelerate the algorithm. To the best of our knowledge, this strategy has not previously been used for hashing. Traditional SDH decomposes the optimization into three sub-problems, with the most critical sub-problem - discrete optimization for binary hash codes - solved using iterative discrete cyclic coordinate descent (DCC), which is time-consuming. In contrast, FSDH has a closed-form solution and requires only a single hash code-solving step rather than an iterative one, which is highly efficient. Furthermore, FSDH is usually faster than SDH at solving the projection matrix for least squares regression, making FSDH generally faster than SDH. For example, our results show that FSDH is about 12 times faster than SDH when the number of hashing bits is 128 on the CIFAR-10 database, and about 151 times faster than FastHash when the number of hashing bits is 64 on the MNIST database. Our experimental results show that FSDH is not only fast but also outperforms other comparative methods.
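
To see why regressing labels to codes removes DCC's bit-by-bit iteration: if the codes B minimize ||YG - B||^2 + nu*||B - F(X)||^2 over {-1, 1}, the solution decouples per entry into a single sign(.) step. The sketch below shows that flavor of the update; variable names and the exact formula are assumptions, not the paper's notation.

```python
import numpy as np

def fsdh_code_step(Y, G, FX, nu=1.0):
    # Y: (n, c) one-hot labels; G: (c, k) label-to-code regression matrix
    # FX: (n, k) embedded features; entrywise minimizer is a closed form
    return np.sign(Y @ G + nu * FX)   # one pass, no bit-wise iteration

n, c, k = 200, 10, 64
Y = np.eye(c)[np.random.randint(0, c, n)]
B = fsdh_code_step(Y, np.random.randn(c, k), np.random.randn(n, k))
print(B.shape, np.unique(B))  # (200, 64) [-1.  1.]
```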
